Skip to content

add dist matrix function and update tutorial#97

Merged
yemeen merged 5 commits intomainfrom
update-matisse
Feb 27, 2026
Merged

add dist matrix function and update tutorial#97
yemeen merged 5 commits intomainfrom
update-matisse

Conversation

@yemeen
Copy link
Member

@yemeen yemeen commented Feb 26, 2026

Pull Request

Summary

add dist matrix function and update tutorial

Description

uses previous dist command to automatically create a distance matrix

Motivation and Context

protein stuff!

How has this been tested?

ran make tests!

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • [ x ] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ x ] Documentation update
  • Code refactoring
  • Performance improvement
  • Test improvement

Checklist

  • [ x ] My code follows the code style of this project. (make clean)
  • [ x ] I have incremented the version number in the pyproject.toml file (if applicable).
  • [ x ] I have added tests to cover my changes (if applicable).
  • [ x ] All new and existing tests passed. (make tests)
  • [ x ] I have updated the documentation (if applicable).
  • [ x ] My changes generate no new warnings.
  • [ x ] I have added any necessary new dependencies to pyproject.toml.

Additional Notes

Screenshots (if applicable)

@yemeen yemeen requested a review from lizliz as a code owner February 26, 2026 21:58
@lizliz lizliz requested review from Copilot and removed request for lizliz February 27, 2026 15:15
Copy link
Contributor

@lizliz lizliz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new dist_matrix classmethod to the ECTResult class for computing distance matrices between multiple ECT results, and updates the tutorial to demonstrate its usage. The PR also changes the default distance metric from "cityblock" to "frobenius" in the dist method and exports ECTResult in the package's __init__.py.

Changes:

  • Added dist_matrix classmethod to compute pairwise distance matrices between ECT results
  • Changed default distance metric from "cityblock" to "frobenius" in the dist method
  • Updated tutorial (example_matisse.ipynb) to demonstrate the new dist_matrix functionality
  • Exported ECTResult in __init__.py for easier imports
  • Incremented version from 1.2.3 to 1.2.4

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

File Description
src/ect/results.py Added dist_matrix classmethod, changed default metric to frobenius, implemented frobenius distance computation
tests/test_ect_result.py Updated tests to reflect the default metric change from cityblock to frobenius
src/ect/init.py Exported ECTResult class
pyproject.toml Incremented version from 1.2.3 to 1.2.4
Comments suppressed due to low confidence (4)

src/ect/results.py:402

  • The implementation of the Frobenius distance for the dist_matrix method is inefficient. For each row i, it calls results[i].dist(results, metric="frobenius"), which computes distances to all results including itself. This results in O(n²) distance computations even though a distance matrix is symmetric. A more efficient approach would be to compute only the upper triangular portion and mirror it, or use vectorized operations on the stacked data similar to how the string metrics are handled with pdist.
        if isinstance(metric, str) and metric.lower() in ("frobenius", "fro"):
            return np.vstack([results[i].dist(results, metric="frobenius") for i in range(len(results))])

src/ect/results.py:411

  • The new dist_matrix classmethod lacks test coverage. No tests have been added to verify its functionality, including edge cases such as empty input, single element lists, shape mismatches, different metrics (frobenius, cityblock, euclidean, custom callables), and that it produces a proper symmetric distance matrix with zeros on the diagonal. Given that other methods in this file have comprehensive test coverage in test_ect_result.py, this new method should follow the same pattern.
    @classmethod
    def dist_matrix(
        cls,
        results: List["ECTResult"],
        metric: Union[str, Callable] = "frobenius",
        **kwargs,
    ) -> np.ndarray:
        if not results:
            return np.empty((0, 0), dtype=np.float64)

        shape0 = results[0].shape
        for i, r in enumerate(results):
            if r.shape != shape0:
                raise ValueError(f"Shape mismatch at index {i}: {shape0} vs {r.shape}")

        if isinstance(metric, str) and metric.lower() in ("frobenius", "fro"):
            return np.vstack([results[i].dist(results, metric="frobenius") for i in range(len(results))])

        if isinstance(metric, str):
            X = np.stack([np.asarray(r, dtype=np.float64).ravel() for r in results], axis=0)
            try:
                return squareform(pdist(X, metric=metric, **kwargs))
            except TypeError:
                return cdist(X, X, metric=metric, **kwargs)

        return np.vstack([results[i].dist(results, metric=metric, **kwargs) for i in range(len(results))])

src/ect/results.py:411

  • The Frobenius distance implementation for callable metrics (line 411) may have inconsistent behavior. When the metric is a callable, it calls results[i].dist(results, metric=metric, **kwargs) for each i, which will pass through to the cdist code path that uses raveled 1D arrays. However, for Frobenius distance specifically, you're computing the matrix norm on 2D arrays (lines 368-375). This means custom callable distance functions will operate on flattened data while Frobenius operates on the full matrix structure. Consider whether this is the intended behavior or if there should be a note in the documentation about this distinction.
        return np.vstack([results[i].dist(results, metric=metric, **kwargs) for i in range(len(results))])

src/ect/results.py:411

  • The dist_matrix method is missing a docstring. This is a public classmethod that should have comprehensive documentation explaining its purpose, parameters, return value, and usage examples. The documentation should follow the same style as the dist method above, including details about the metric parameter, **kwargs, error conditions, and example usage.
    @classmethod
    def dist_matrix(
        cls,
        results: List["ECTResult"],
        metric: Union[str, Callable] = "frobenius",
        **kwargs,
    ) -> np.ndarray:
        if not results:
            return np.empty((0, 0), dtype=np.float64)

        shape0 = results[0].shape
        for i, r in enumerate(results):
            if r.shape != shape0:
                raise ValueError(f"Shape mismatch at index {i}: {shape0} vs {r.shape}")

        if isinstance(metric, str) and metric.lower() in ("frobenius", "fro"):
            return np.vstack([results[i].dist(results, metric="frobenius") for i in range(len(results))])

        if isinstance(metric, str):
            X = np.stack([np.asarray(r, dtype=np.float64).ravel() for r in results], axis=0)
            try:
                return squareform(pdist(X, metric=metric, **kwargs))
            except TypeError:
                return cdist(X, X, metric=metric, **kwargs)

        return np.vstack([results[i].dist(results, metric=metric, **kwargs) for i in range(len(results))])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yemeen yemeen merged commit 5219ac3 into main Feb 27, 2026
7 checks passed
@yemeen yemeen deleted the update-matisse branch February 27, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants